Gene Onoltogy Analysis on day50 Ancestral

1. Environment Set Up

for (i in 1:length(params))
  print(paste('Parameter:', names(params)[i], ' - Value:', params[[i]], '- Class:', class(params[[i]])))
## [1] "Parameter: Dataset  - Value: CHD2_iPSCs_and_organoids_PublicRepo - Class: character"
## [1] "Parameter: SEFile  - Value: /group/testa/Project/CHD2/BulkRNAseq/results/PublicRepo/2.DEA/day50/Output/Savings/day50CbO.SE_deseq2_AR.rds - Class: character"
## [1] "Parameter: DEAList  - Value: /group/testa/Project/CHD2/BulkRNAseq/results/PublicRepo/2.DEA/day50/Output/Savings/day50CbO.DEAList_AR.rds - Class: character"
## [1] "Parameter: AR  - Value: /group/testa/Project/CHD2/BulkRNAseq/results/PublicRepo/2.DEA/day50/Output/Savings/day50CbO.deseqARvsWT.rds - Class: character"
## [1] "Parameter: SavingFolder  - Value: /group/testa/Project/CHD2/BulkRNAseq/results/PublicRepo/6.Enrichments/day50/Output/Savings/ - Class: character"
## [1] "Parameter: FiguresFolder  - Value: /group/testa/Project/CHD2/BulkRNAseq/results/PublicRepo/6.Enrichments/day50/Output/Figures/ - Class: character"
## [1] "Parameter: FDRthr  - Value: 0.05 - Class: numeric"
## [1] "Parameter: logFCthr  - Value: 0.55 - Class: numeric"
## [1] "Parameter: TopGO  - Value: BP_MF_CC - Class: character"
## [1] "Parameter: GoEnTh  - Value: 1 - Class: numeric"
## [1] "Parameter: GoPvalTh  - Value: 0.05 - Class: numeric"
## [1] "Parameter: NbName  - Value: TopGO_day50_AR - Class: character"
## [1] "Parameter: SaveImages  - Value: FALSE - Class: logical"
  • Dataset: name of the dataset that is processed.
  • SE: input file containing differential expression results from DESeq2 (absolute path).
  • FDRthr: threshold on False Discovery Rate. Default 0.05.
  • logFCthr: threshold on False Discovery Rate. Default 1.5.
  • SavingFolder: directory where produced files will be written (absolute path). Default is getwd().
  • TopGO: string that specify the ontology domains to be analysed. Default BP and MF; also CC can be added.
library(RNASeqBulkExploratory)
library(SummarizedExperiment)
library(tidyr)
library(AnnotationDbi)
library(org.Hs.eg.db)
library(topGO)
library(sechm)
library(ggplot2)
library(grid)
library(gridExtra)
library(RColorBrewer)
library(cowplot)
library(data.table)
source('/group/testa/Users/oliviero.leonardi/myProjects/CHD2/BulkRNAseq/ContainerHome/CHD2_organoids/NoGradientBarplots.R')
Dataset <- params$Dataset
logFCthr <- params$logFCthr
FDRthr <- params$FDRthr

FdrTh <- FDRthr
logFcTh <- logFCthr

SavingFolder <- ifelse(is.null(params$SavingFolder), getwd(), params$SavingFolder)
FiguresFolder <- ifelse(is.null(params$FiguresFolder), getwd(), params$FiguresFolder)


if (dir.exists(SavingFolder) == FALSE) {
  dir.create(SavingFolder, recursive=TRUE)
}

2. Data Upload

  • Summarized Experiment object containing expression data used for DEA and gene and sample metadata
  • DEA object, containing results of the differential expression

2.1 Load Data from DEA

#SE object coming from DEA, but not containing specific contrast results
SE_DEA <- readRDS(params$SEFile)

SE_DEA <- SE_DEA[rowData(SE_DEA)$GeneName != '', ]
rownames(SE_DEA) <- rowData(SE_DEA)$GeneName

# List with differential expression results (all time-points)
DEA <- readRDS(params$DEAList)
colvector <- c("#5ec962", "#e95462", "#2c728e")
names(colvector) <- c('All', 'Up',  'Down')

2.2 Add DEA results to SE

if(! identical(rownames(SE_DEA), row.names(DEA$AR$res))){
  stop('Expression data in SE and results from differential espression analysis are inconsistent.')
}

SE_DEA <- mergeDeaSE(SE_DEA, DEA$AR$res, subsetRowDataCols=NULL,
                     logFcCol='log2FoldChange', FdrCol='padj') #specify
## Renaming " log2FoldChange " to "logFC"
## Renaming "  padj  " to "FDR"

17673 genes in 14 samples have been testes for differential expression.

The following number of genes are identified as differentially expressed:

  • FDR < 0.1: 3210 differentially expressed genes
  • FDR < 0.05: 2370 differentially expressed genes
  • FDR < 0.05 and FC > 1.5: 652 differentially expressed genes
  • FDR < 0.05 and FC > 2: 240 differentially expressed genes

Imposing a threshold of 0.55 on the Log2FC and 0.05 on the FDR (as specified in parameters), 2370 genes are selected: 2105 up-regulated genes and 2309 down-regulated genes.


3. RESULTS NAVIGATION: Interactive Table

An interactive table show the results for all DEGs (ranked according to FDR). A table of all DEGs can be downloaded from here.

From topTag I generate an interactive table for result interrogation with link to the Gene Cards. The table reports all the genes having a FDR < 0.05 and a Log2FC > 0.55 as absolute value, according to the threshold settings.

DEGsTable(SE_DEA, FdrTh=FdrTh, logFcTh=logFCthr, maxGenes=Inf, saveDEGs=FALSE)

4. RESULTS VISUALIZATION

4.1 Volcano plot

The results of the differential expression analysis are visualized by Volcano plot. An interactive version is included in the html (only genes with FDR < threshold), while a static version is saved.

plotVolcanoSE(SE=SE_DEA, FdrTh=FDRthr, logFcTh=logFCthr, 
              FdrCeil=1e-10, logFcCeil=4, Interactive = FALSE)
## Warning: Removed 3084 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 17560 rows containing missing values or values outside the scale range
## (`geom_text_repel()`).


plotVolcanoSE(SE=SE_DEA, FdrTh=FDRthr, logFcTh=logFCthr, 
              FdrCeil=1e-10, logFcCeil=4, Interactive = TRUE)

4.2 Heatmap for significant genes

Heatmaps for DEGs, showing scaled vst values.

DEGs <- dplyr::filter(data.frame(rowData(SE_DEA)), FDR < FDRthr & abs(logFC) > log2(logFCthr))
ScaledCols <- c('darkblue', "purple","white","lightgoldenrod1", 'goldenrod1')
# sechm::sechm(SE_DEA, genes=DEGs$GeneName, assayName="vst", gaps_at="Genotype", show_rownames=FALSE,
#       top_annotation=c('Genotype'), hmcols=ScaledCols, show_colnames=TRUE,
#       do.scale=TRUE, breaks=0.85, column_title = "Scaled Vst Values")

5. TOPGO for Gene Ontology Enrichment analysis

Gene ontology enrichment analysis is performed on the set of 2370 genes using TopGO with Fisher statistics and weight01 algorithm.

For each specified domain of the ontology:

  • Enrichment analysis on all DEGs or splitted in down- and up-regulated

5.1 Selection of modulated genes and generation of gene vectors

I generate vectors for the gene universe, all modulated genes, up-regulated genes and down-regulated genes in the format required by TopGo.

GeneVectors <- topGOGeneVectors(SE_DEA, FdrTh=FDRthr, logFcTh=logFCthr)
## Gene vector contains levels: 0,1
## Gene vector contains levels: 0,1
## Gene vector contains levels: 0,1

Therefore:

  • universe genes: 17673 genes
  • modulated genes: 721 genes
  • down-regulated genes: 543 genes of interest
  • up-regulated genes: 178 genes of interest

Then I set parameters according to the gene ontology domains to be evaluated. By default, Biological Process and Molecular Function domains are interrogated.

BpEval <- ifelse(length(grep('BP', params$TopGO))!=0, TRUE, FALSE)
MfEval <- ifelse(length(grep('MF', params$TopGO))!=0, TRUE, FALSE)
CcEval <- ifelse(length(grep('CC', params$TopGO))!=0, TRUE, FALSE)

5.2 TopGO analysis: Biological Process

On the basis of the analysis settings, the enrichment for Biological Process IS performed.

Biological Process Analysis for ALL modulated genes: 721 genes

# I generate a list that contains the association between each gene and the GO terms that are associated to it
BPannAR <- topGO::annFUN.org(whichOnto="BP", feasibleGenes=names(GeneVectors$DEGenes), 
                    mapping="org.Hs.eg.db", ID="symbol") %>% inverseList()

# Wrapper function for topGO analysis (see helper file)
ResBPAllAR <- topGOResults(Genes=GeneVectors$DEGenes, gene2GO=BPannAR, ontology='BP', 
                         desc=NULL, nodeSize=5, algorithm='weight01', statistic='fisher', 
                         EnTh=params$GoEnTh, PvalTh=params$GoPvalTh, minTerms=15, geneTh=4,
                         saveRes=TRUE, fileName='BPAllAR', outDir=SavingFolder)
## Gene vector contains levels: 0,1
## 
## Building most specific GOs .....
##  ( 11741 GO terms found. )
## 
## Build GO DAG topology ..........
##  ( 15403 GO terms and 35047 relations. )
## 
## Annotating nodes ...............
##  ( 13971 genes annotated to the GO terms. )
## 
##           -- Weight01 Algorithm -- 
## 
##       the algorithm is scoring 5426 nontrivial nodes
##       parameters: 
##           test statistic: fisher
## 
##   Level 19:  2 nodes to be scored    (0 eliminated genes)
## 
##   Level 18:  3 nodes to be scored    (0 eliminated genes)
## 
##   Level 17:  9 nodes to be scored    (24 eliminated genes)
## 
##   Level 16:  20 nodes to be scored   (41 eliminated genes)
## 
##   Level 15:  43 nodes to be scored   (122 eliminated genes)
## 
##   Level 14:  75 nodes to be scored   (310 eliminated genes)
## 
##   Level 13:  134 nodes to be scored  (663 eliminated genes)
## 
##   Level 12:  232 nodes to be scored  (1487 eliminated genes)
## 
##   Level 11:  437 nodes to be scored  (3539 eliminated genes)
## 
##   Level 10:  637 nodes to be scored  (5458 eliminated genes)
## 
##   Level 9:   807 nodes to be scored  (7267 eliminated genes)
## 
##   Level 8:   820 nodes to be scored  (9052 eliminated genes)
## 
##   Level 7:   784 nodes to be scored  (10815 eliminated genes)
## 
##   Level 6:   656 nodes to be scored  (12126 eliminated genes)
## 
##   Level 5:   416 nodes to be scored  (12854 eliminated genes)
## 
##   Level 4:   226 nodes to be scored  (13470 eliminated genes)
## 
##   Level 3:   102 nodes to be scored  (13659 eliminated genes)
## 
##   Level 2:   22 nodes to be scored   (13757 eliminated genes)
## 
##   Level 1:   1 nodes to be scored    (13806 eliminated genes)

Biological Process Analysis for DOWN-REGULATED genes: 543 genes

# Wrapper function for topGO analysis (see helper file)
ResBPDownAR <- topGOResults(Genes=GeneVectors$DEGenesDown, gene2GO=BPannAR, ontology='BP', 
                          desc=NULL, nodeSize=5, algorithm='weight01', statistic='fisher', 
                          EnTh=params$GoEnTh, PvalTh=params$GoPvalTh, minTerms=15, geneTh=4,
                          saveRes=TRUE, fileName='BPDownAR', outDir=paste0(SavingFolder)) 
## Gene vector contains levels: 0,1
## 
## Building most specific GOs .....
##  ( 11741 GO terms found. )
## 
## Build GO DAG topology ..........
##  ( 15403 GO terms and 35047 relations. )
## 
## Annotating nodes ...............
##  ( 13971 genes annotated to the GO terms. )
## 
##           -- Weight01 Algorithm -- 
## 
##       the algorithm is scoring 4406 nontrivial nodes
##       parameters: 
##           test statistic: fisher
## 
##   Level 19:  2 nodes to be scored    (0 eliminated genes)
## 
##   Level 18:  3 nodes to be scored    (0 eliminated genes)
## 
##   Level 17:  9 nodes to be scored    (24 eliminated genes)
## 
##   Level 16:  19 nodes to be scored   (41 eliminated genes)
## 
##   Level 15:  37 nodes to be scored   (122 eliminated genes)
## 
##   Level 14:  61 nodes to be scored   (304 eliminated genes)
## 
##   Level 13:  98 nodes to be scored   (599 eliminated genes)
## 
##   Level 12:  173 nodes to be scored  (1377 eliminated genes)
## 
##   Level 11:  296 nodes to be scored  (3332 eliminated genes)
## 
##   Level 10:  467 nodes to be scored  (5216 eliminated genes)
## 
##   Level 9:   619 nodes to be scored  (6843 eliminated genes)
## 
##   Level 8:   672 nodes to be scored  (8505 eliminated genes)
## 
##   Level 7:   683 nodes to be scored  (10339 eliminated genes)
## 
##   Level 6:   572 nodes to be scored  (12012 eliminated genes)
## 
##   Level 5:   373 nodes to be scored  (12834 eliminated genes)
## 
##   Level 4:   203 nodes to be scored  (13467 eliminated genes)
## 
##   Level 3:   97 nodes to be scored   (13658 eliminated genes)
## 
##   Level 2:   21 nodes to be scored   (13755 eliminated genes)
## 
##   Level 1:   1 nodes to be scored    (13806 eliminated genes)

# Selection on enrichment of at least 2 is implemented (also to avoid depleted categories). Then categories are ranked by PVal and all the ones with Pval < th are selected. If the number is < minTerms, othter terms are included to reach the minimum number. 
GOTable(ResBPDownAR$ResSel, maxGO=20)

Biological Process Analysis for UP-REGULATED genes: 178 genes

ResBPUpAR <- topGOResults(Genes=GeneVectors$DEGenesUp, gene2GO=BPannAR, ontology='BP', 
                        desc=NULL, nodeSize=5, algorithm='weight01', statistic='fisher', 
                        EnTh=params$GoEnTh, PvalTh=params$GoPvalTh, minTerms=15, geneTh=4,
                        saveRes=TRUE, fileName='BPUpAR', outDir=SavingFolder) 
## Gene vector contains levels: 0,1
## 
## Building most specific GOs .....
##  ( 11741 GO terms found. )
## 
## Build GO DAG topology ..........
##  ( 15403 GO terms and 35047 relations. )
## 
## Annotating nodes ...............
##  ( 13971 genes annotated to the GO terms. )
## 
##           -- Weight01 Algorithm -- 
## 
##       the algorithm is scoring 3272 nontrivial nodes
##       parameters: 
##           test statistic: fisher
## 
##   Level 16:  2 nodes to be scored    (0 eliminated genes)
## 
##   Level 15:  11 nodes to be scored   (0 eliminated genes)
## 
##   Level 14:  25 nodes to be scored   (72 eliminated genes)
## 
##   Level 13:  65 nodes to be scored   (334 eliminated genes)
## 
##   Level 12:  115 nodes to be scored  (794 eliminated genes)
## 
##   Level 11:  248 nodes to be scored  (2917 eliminated genes)
## 
##   Level 10:  348 nodes to be scored  (4744 eliminated genes)
## 
##   Level 9:   456 nodes to be scored  (6179 eliminated genes)
## 
##   Level 8:   477 nodes to be scored  (7807 eliminated genes)
## 
##   Level 7:   483 nodes to be scored  (9539 eliminated genes)
## 
##   Level 6:   450 nodes to be scored  (11261 eliminated genes)
## 
##   Level 5:   323 nodes to be scored  (12447 eliminated genes)
## 
##   Level 4:   173 nodes to be scored  (13347 eliminated genes)
## 
##   Level 3:   75 nodes to be scored   (13603 eliminated genes)
## 
##   Level 2:   20 nodes to be scored   (13734 eliminated genes)
## 
##   Level 1:   1 nodes to be scored    (13795 eliminated genes)

#dir.create(paste0(SavingFolder, 'TopGO/BPUp'), recursive=TRUE)
#GOAnnotation(ResBPUp$ResSel, GOdata=ResBPUp$GOdata, SavingFolder=paste0(SavingFolder, 'TopGO/BPUp'), keytype='SYMBOL')
GOTable(ResBPUpAR$ResSel, maxGO=20)

Result visualization: Barplot

topGOBarplotAll(TopGOResAll=ResBPAllAR$ResSel, TopGOResDown=ResBPDownAR$ResSel, TopGOResUp = ResBPUpAR$ResSel,
                terms=12, pvalTh=0.05, plotTitle=NULL, gradient = FALSE, cols = colvector)
## Warning in topGOBarplotAll(TopGOResAll = ResBPAllAR$ResSel, TopGOResDown = ResBPDownAR$ResSel, : Please make sure you specified names of the cols vector correctly.
##                 See `?topGOBarplotAll`

5.3 TopGO analysis: Molecular Function

On the basis of the analysis settings, the enrichment for Molecular Function IS performed.

Molecular Function Enrichment for ALL modulated genes: 721 genes

# I generate a list that contains the association between each gene and the GO terms that are associated to it
MFannAR <- topGO::annFUN.org(whichOnto='MF', feasibleGenes=names(GeneVectors$DEGenes), 
                           mapping='org.Hs.eg.db', ID='symbol') %>% inverseList()

# Wrapper function for topGO analysis (see helper file)
ResMFAllAR <- topGOResults(Genes=GeneVectors$DEGenes, gene2GO=MFannAR, ontology='MF', 
                         desc=NULL, nodeSize=5, algorithm='weight01', statistic='fisher', 
                         EnTh=params$GoEnTh, PvalTh=params$GoPvalTh, minTerms=15, geneTh=4,
                         saveRes=TRUE, fileName='MFAllAR', outDir=SavingFolder) 
## Gene vector contains levels: 0,1
## 
## Building most specific GOs .....
##  ( 4161 GO terms found. )
## 
## Build GO DAG topology ..........
##  ( 4629 GO terms and 5968 relations. )
## 
## Annotating nodes ...............
##  ( 14383 genes annotated to the GO terms. )
## 
##           -- Weight01 Algorithm -- 
## 
##       the algorithm is scoring 920 nontrivial nodes
##       parameters: 
##           test statistic: fisher
## 
##   Level 12:  5 nodes to be scored    (0 eliminated genes)
## 
##   Level 11:  18 nodes to be scored   (0 eliminated genes)
## 
##   Level 10:  16 nodes to be scored   (29 eliminated genes)
## 
##   Level 9:   35 nodes to be scored   (235 eliminated genes)
## 
##   Level 8:   79 nodes to be scored   (1310 eliminated genes)
## 
##   Level 7:   129 nodes to be scored  (3362 eliminated genes)
## 
##   Level 6:   175 nodes to be scored  (4071 eliminated genes)
## 
##   Level 5:   208 nodes to be scored  (5690 eliminated genes)
## 
##   Level 4:   184 nodes to be scored  (8719 eliminated genes)
## 
##   Level 3:   54 nodes to be scored   (11438 eliminated genes)
## 
##   Level 2:   16 nodes to be scored   (12335 eliminated genes)
## 
##   Level 1:   1 nodes to be scored    (14225 eliminated genes)

Molecular Function Enrichment for DOWN-REGULATED genes: 543 genes

ResMFDownAR <- topGOResults(Genes=GeneVectors$DEGenesDown, gene2GO=MFannAR, ontology='MF', 
                          desc=NULL, nodeSize=5, algorithm='weight01', statistic='fisher', 
                          EnTh=params$GoEnTh, PvalTh=params$GoPvalTh, minTerms=15, geneTh=4,
                          saveRes=TRUE, fileName='MFDownAR', outDir=SavingFolder) 
## Gene vector contains levels: 0,1
## 
## Building most specific GOs .....
##  ( 4161 GO terms found. )
## 
## Build GO DAG topology ..........
##  ( 4629 GO terms and 5968 relations. )
## 
## Annotating nodes ...............
##  ( 14383 genes annotated to the GO terms. )
## 
##           -- Weight01 Algorithm -- 
## 
##       the algorithm is scoring 771 nontrivial nodes
##       parameters: 
##           test statistic: fisher
## 
##   Level 12:  5 nodes to be scored    (0 eliminated genes)
## 
##   Level 11:  16 nodes to be scored   (0 eliminated genes)
## 
##   Level 10:  15 nodes to be scored   (29 eliminated genes)
## 
##   Level 9:   31 nodes to be scored   (216 eliminated genes)
## 
##   Level 8:   63 nodes to be scored   (1309 eliminated genes)
## 
##   Level 7:   108 nodes to be scored  (3353 eliminated genes)
## 
##   Level 6:   143 nodes to be scored  (4014 eliminated genes)
## 
##   Level 5:   170 nodes to be scored  (5527 eliminated genes)
## 
##   Level 4:   155 nodes to be scored  (8454 eliminated genes)
## 
##   Level 3:   49 nodes to be scored   (11320 eliminated genes)
## 
##   Level 2:   15 nodes to be scored   (12267 eliminated genes)
## 
##   Level 1:   1 nodes to be scored    (14221 eliminated genes)

#dir.create(paste0(SavingFolder, 'TopGO/MFDown'), recursive=TRUE)
#GOAnnotation(ResMFDown$ResSel, GOdata=ResMFDown$GOdata, SavingFolder=paste0(SavingFolder, 'TopGO/MFDown'), keytype='SYMBOL')
GOTable(ResMFDownAR$ResSel, maxGO=20)

Molecular Function Analysis for UP-REGULATED genes: 178 genes

ResMFUpAR <- topGOResults(Genes=GeneVectors$DEGenesUp, gene2GO=MFannAR, ontology='MF', 
                        desc=NULL, nodeSize=5, algorithm='weight01', statistic='fisher', 
                        EnTh=params$GoEnTh, PvalTh=params$GoPvalTh, minTerms=15, geneTh=4,
                        saveRes=TRUE, fileName='MFUpAR', outDir=SavingFolder)
## Gene vector contains levels: 0,1
## 
## Building most specific GOs .....
##  ( 4161 GO terms found. )
## 
## Build GO DAG topology ..........
##  ( 4629 GO terms and 5968 relations. )
## 
## Annotating nodes ...............
##  ( 14383 genes annotated to the GO terms. )
## 
##           -- Weight01 Algorithm -- 
## 
##       the algorithm is scoring 485 nontrivial nodes
##       parameters: 
##           test statistic: fisher
## 
##   Level 11:  3 nodes to be scored    (0 eliminated genes)
## 
##   Level 10:  4 nodes to be scored    (0 eliminated genes)
## 
##   Level 9:   10 nodes to be scored   (60 eliminated genes)
## 
##   Level 8:   28 nodes to be scored   (1078 eliminated genes)
## 
##   Level 7:   57 nodes to be scored   (2756 eliminated genes)
## 
##   Level 6:   92 nodes to be scored   (3160 eliminated genes)
## 
##   Level 5:   115 nodes to be scored  (4868 eliminated genes)
## 
##   Level 4:   119 nodes to be scored  (7961 eliminated genes)
## 
##   Level 3:   41 nodes to be scored   (10792 eliminated genes)
## 
##   Level 2:   15 nodes to be scored   (12050 eliminated genes)
## 
##   Level 1:   1 nodes to be scored    (14221 eliminated genes)

#dir.create(paste0(SavingFolder, 'TopGO/MFUp'), recursive=TRUE)
#GOAnnotation(ResMFUp$ResSel, GOdata=ResMFUp$GOdata, SavingFolder=paste0(SavingFolder, 'TopGO/MFUp'), keytype='SYMBOL')
GOTable(ResMFUpAR$ResSel, maxGO=20)

Result visualization: Barplot

topGOBarplotAll(TopGOResAll=ResMFAllAR$ResSel, TopGOResDown=ResMFDownAR$ResSel, TopGOResUp=ResMFUpAR$ResSel, 
                terms=12, pvalTh=0.05, plotTitle=NULL, gradient = FALSE, cols = colvector)
## Warning in topGOBarplotAll(TopGOResAll = ResMFAllAR$ResSel, TopGOResDown = ResMFDownAR$ResSel, : Please make sure you specified names of the cols vector correctly.
##                 See `?topGOBarplotAll`

5.4 TopGO analysis: Cellular Component

On the basis of the analysis settings, the enrichment for Cellular Component IS performed.

Cellular Component Enrichment for ALL modulated genes: 721 genes

# I generate a list that contains the association between each gene and the GO terms that are associated to it
CCannAR <- topGO::annFUN.org(whichOnto='CC', feasibleGenes=names(GeneVectors$DEGenes), 
                           mapping='org.Hs.eg.db', ID='symbol') %>% inverseList()

# Wrapper function for topGO analysis (see helper file)
ResCCAllAR <- topGOResults(Genes=GeneVectors$DEGenes, gene2GO=CCannAR, ontology='CC', 
                         desc=NULL, nodeSize=5, algorithm='weight01', statistic='fisher', 
                         EnTh=params$GoEnTh, PvalTh=params$GoPvalTh, minTerms=15, geneTh=4,
                         saveRes=TRUE, fileName='CCAllAR', outDir=SavingFolder)
## Gene vector contains levels: 0,1
## 
## Building most specific GOs .....
##  ( 1698 GO terms found. )
## 
## Build GO DAG topology ..........
##  ( 1919 GO terms and 3235 relations. )
## 
## Annotating nodes ...............
##  ( 14645 genes annotated to the GO terms. )
## 
##           -- Weight01 Algorithm -- 
## 
##       the algorithm is scoring 607 nontrivial nodes
##       parameters: 
##           test statistic: fisher
## 
##   Level 13:  3 nodes to be scored    (0 eliminated genes)
## 
##   Level 12:  6 nodes to be scored    (0 eliminated genes)
## 
##   Level 11:  29 nodes to be scored   (54 eliminated genes)
## 
##   Level 10:  67 nodes to be scored   (86 eliminated genes)
## 
##   Level 9:   84 nodes to be scored   (797 eliminated genes)
## 
##   Level 8:   97 nodes to be scored   (2734 eliminated genes)
## 
##   Level 7:   93 nodes to be scored   (4985 eliminated genes)
## 
##   Level 6:   83 nodes to be scored   (8759 eliminated genes)
## 
##   Level 5:   64 nodes to be scored   (10555 eliminated genes)
## 
##   Level 4:   41 nodes to be scored   (12658 eliminated genes)
## 
##   Level 3:   37 nodes to be scored   (13992 eliminated genes)
## 
##   Level 2:   2 nodes to be scored    (14422 eliminated genes)
## 
##   Level 1:   1 nodes to be scored    (14570 eliminated genes)

#write.table(ResCCAll$ResAll, file=paste0(SavingFolder, 'TopGO/CCAllResults.txt'), sep='\t', row.names=FALSE)

Cellular Component Enrichment for DOWN-REGULATED genes: 543 genes

# Wrapper function for topGO analysis (see helper file)
ResCCDownAR <- topGOResults(Genes=GeneVectors$DEGenesDown, gene2GO=CCannAR, ontology='CC', 
                          desc=NULL, nodeSize=5, algorithm='weight01', statistic='fisher', 
                          EnTh=params$GoEnTh, PvalTh=params$GoPvalTh, minTerms=15, geneTh=4,
                          saveRes=TRUE, fileName='CCDownAR', outDir=SavingFolder)
## Gene vector contains levels: 0,1
## 
## Building most specific GOs .....
##  ( 1698 GO terms found. )
## 
## Build GO DAG topology ..........
##  ( 1919 GO terms and 3235 relations. )
## 
## Annotating nodes ...............
##  ( 14645 genes annotated to the GO terms. )
## 
##           -- Weight01 Algorithm -- 
## 
##       the algorithm is scoring 517 nontrivial nodes
##       parameters: 
##           test statistic: fisher
## 
##   Level 13:  3 nodes to be scored    (0 eliminated genes)
## 
##   Level 12:  6 nodes to be scored    (0 eliminated genes)
## 
##   Level 11:  22 nodes to be scored   (54 eliminated genes)
## 
##   Level 10:  53 nodes to be scored   (86 eliminated genes)
## 
##   Level 9:   68 nodes to be scored   (604 eliminated genes)
## 
##   Level 8:   80 nodes to be scored   (2433 eliminated genes)
## 
##   Level 7:   77 nodes to be scored   (4777 eliminated genes)
## 
##   Level 6:   73 nodes to be scored   (8576 eliminated genes)
## 
##   Level 5:   60 nodes to be scored   (10481 eliminated genes)
## 
##   Level 4:   38 nodes to be scored   (12610 eliminated genes)
## 
##   Level 3:   34 nodes to be scored   (13968 eliminated genes)
## 
##   Level 2:   2 nodes to be scored    (14420 eliminated genes)
## 
##   Level 1:   1 nodes to be scored    (14570 eliminated genes)


#dir.create(paste0(SavingFolder, 'TopGO/CCDown'), recursive=TRUE)
#GOAnnotation(ResCCDown$ResSel, GOdata=ResCCDown$GOdata, SavingFolder=paste0(SavingFolder, 'TopGO/CCDown'), keytype='SYMBOL')
GOTable(ResCCDownAR$ResSel, maxGO=20)

Cellular Component Analysis for UP-REGULATED genes: 178 genes

# Wrapper function for topGO analysis (see helper file)
ResCCUpAR <- topGOResults(Genes=GeneVectors$DEGenesUp, gene2GO=CCannAR, ontology='CC', 
                        desc=NULL, nodeSize=5, algorithm='weight01', statistic='fisher', 
                        EnTh=params$GoEnTh, PvalTh=params$GoPvalTh, minTerms=15, geneTh=4,
                        saveRes=TRUE, fileName='CCUpAR', outDir=SavingFolder)
## Gene vector contains levels: 0,1
## 
## Building most specific GOs .....
##  ( 1698 GO terms found. )
## 
## Build GO DAG topology ..........
##  ( 1919 GO terms and 3235 relations. )
## 
## Annotating nodes ...............
##  ( 14645 genes annotated to the GO terms. )
## 
##           -- Weight01 Algorithm -- 
## 
##       the algorithm is scoring 384 nontrivial nodes
##       parameters: 
##           test statistic: fisher
## 
##   Level 11:  16 nodes to be scored   (0 eliminated genes)
## 
##   Level 10:  37 nodes to be scored   (0 eliminated genes)
## 
##   Level 9:   46 nodes to be scored   (579 eliminated genes)
## 
##   Level 8:   57 nodes to be scored   (1950 eliminated genes)
## 
##   Level 7:   62 nodes to be scored   (4265 eliminated genes)
## 
##   Level 6:   59 nodes to be scored   (8420 eliminated genes)
## 
##   Level 5:   43 nodes to be scored   (10450 eliminated genes)
## 
##   Level 4:   27 nodes to be scored   (12634 eliminated genes)
## 
##   Level 3:   34 nodes to be scored   (13985 eliminated genes)
## 
##   Level 2:   2 nodes to be scored    (14420 eliminated genes)
## 
##   Level 1:   1 nodes to be scored    (14570 eliminated genes)


#dir.create(paste0(SavingFolder, 'TopGO/CCUp'), recursive=TRUE)
#GOAnnotation(ResCCUp$ResSel, GOdata=ResCCUp$GOdata, SavingFolder=paste0(SavingFolder, 'TopGO/CCUp'), keytype='SYMBOL')
GOTable(ResCCUpAR$ResSel, maxGO=20)

Result visualization: Barplot

topGOBarplotAll(TopGOResAll=ResCCAllAR$ResSel, TopGOResDown=ResCCDownAR$ResSel, TopGOResUp=ResCCUpAR$ResSel,
                terms=12, pvalTh=0.05, plotTitle=NULL, gradient = FALSE, cols = colvector)
## Warning in topGOBarplotAll(TopGOResAll = ResCCAllAR$ResSel, TopGOResDown = ResCCDownAR$ResSel, : Please make sure you specified names of the cols vector correctly.
##                 See `?topGOBarplotAll`


7. Savings

Most of the useful information has been saved during the analysis. Here I save figures, workspace and information about the session.

if (params$SaveImages == TRUE){   #Just in case since eval only works when knitting
  #Set the folder paths
  from <- paste(getwd(), paste(params$NbName, 'files/figure-html', sep='_'), sep='/')
  to <- params$FiguresFolder

  #Copy to output directory
  file.copy(from, to, recursive = TRUE, copy.mode = TRUE)
}
ResSelBP_AR <- list(ResSelAll = ResBPAllAR$ResSel,
                    ResSelUp = ResBPUpAR$ResSel,
                    ResSelDown = ResBPDownAR$ResSel)

ResSelMF_AR <- list(ResSelAll = ResMFAllAR$ResSel,
                    ResSelUp = ResMFUpAR$ResSel,
                    ResSelDown = ResMFDownAR$ResSel)

ResSelCC_AR <- list(ResSelAll = ResCCAllAR$ResSel,
                    ResSelUp = ResCCUpAR$ResSel,
                    ResSelDown = ResCCDownAR$ResSel)

ResSel_AR <- list(BP = ResSelBP_AR,
                  MF = ResSelMF_AR,
                  CC = ResSelCC_AR)

saveRDS(ResSel_AR, paste0(SavingFolder, '/day50CbO.', 'ResSel_AR.rds'))
SessionInfo <- sessionInfo()
Date <- date()

save.image(paste0(SavingFolder, '/day50CbO.', 'FunctionalAnalysisWorkspace_AR.RData'))

last update on: Fri Jan 10 20:55:19 2025